System and method for detecting multiple number plates of moving cars in a series of 2-D images

ABSTRACT

A stand-alone computer-camera system capable of extracting car-plate information. An on-board computer analyzes the video stream recorded by the camera sensor, and the system can be used with any type of camera sensor. The system has specific characteristics that make it very fast and able to catch the plates of cars moving at high speed. Its algorithms are specially implemented so that they can be ported to an embedded computer system, which usually has less processing power and memory than a general-purpose computer.

BACKGROUND

1. Field

An exemplary embodiment of this invention relates to the field of Automatic Number Plate Recognition (ANPR) systems. More specifically, an exemplary embodiment of the invention relates to a method and a system capable of extracting the location of a car's number plate from a series of 2-D images, using a device equipped with a camera of any kind.

2. Description of the Related Art

Many known devices can detect the location of a car's number plate and then recognize the plate number, producing as output an alphanumeric text corresponding to the characters of the plate.

There are many approaches to car-plate detection and recognition. Most of these systems rely on a personal computer (PC) to carry out the required processing tasks. In such systems, a video digitizer samples the camera sensor and a PC running the car-plate detection and recognition software then processes the data. However, these implementations are not easily portable, are bulky, require a special power supply and are difficult to install on site.

When ANPR systems are used to recognize the plates of moving cars on highways, another important characteristic is recognition speed. To catch fast-moving cars, the plate detector must analyze every frame of the video sequence very quickly. The detection speed depends on both the algorithm and the processor speed. Today's common processors, or even dedicated digital signal processor (DSP) devices, are not able to deliver the required performance.

SUMMARY

An exemplary embodiment of the invention refers to a stand-alone computer-camera system capable of extracting car plates. An on-board computer analyzes the video stream recorded by the camera sensor, and the system can be used with any type of camera sensor. The system has specific characteristics that make it very fast and able to catch the plates of cars moving at high speed.

The algorithms incorporated in this system are specially implemented so that they can be ported to an embedded computer system, which usually has less processing power and memory than a general-purpose computer.

BRIEF DESCRIPTION OF THE DRAWINGS

The exemplary embodiments of the invention will be described in detail, with reference to the following figures, wherein:

FIG. 1 illustrates an exemplary car plate location information extraction system;

FIG. 2 illustrates an exemplary Car Plate Detection Device through which the system detects car-plates and extracts the information of the coordinates;

FIG. 3 illustrates how moving pixels are identified;

FIG. 4 illustrates how each pixel in the background model is modeled from the corresponding pixels of a number of consecutive frames;

FIG. 5 illustrates a flow-chart showing an exemplary method for detecting a car-plate;

FIG. 6 illustrates an exemplary moving portion of the video frame;

FIG. 7 illustrates an exemplary local thresholding approach that employs threshold adaptation using feedback from the system output, more specifically from the Digit Segmentation unit;

FIG. 8 illustrates an exemplary morphological operation;

FIG. 9 illustrates an exemplary run-length encoding technique;

FIG. 10 is a flowchart illustrating an exemplary run-length encoding technique;

FIG. 11 illustrates an exemplary technique for the initial labeling and propagation of labels;

FIG. 12 illustrates an exemplary application of a labeling algorithm;

FIG. 13 is a flowchart illustrating an exemplary conflict-resolving algorithm;

FIG. 14 illustrates an exemplary region feature extraction technique;

FIG. 15 illustrates an exemplary pattern classification scheme used for region classification;

FIG. 16 illustrates how digits in a binary plate image appear as coherent regions;

FIG. 17 is a flowchart illustrating an exemplary technique for a background-foreground inversion for the regions detected as plates; and

FIG. 18 is a flowchart illustrating an exemplary technique for decoding the new inverted runs into binary image format.

DETAILED DESCRIPTION

This description concerns the detection of multiple car plates in a video sequence and the extraction of the coordinates of each plate. In accordance with an exemplary embodiment of the present invention, location information of car plates is extracted from an image frame sequence using a system like the one shown in FIG. 1. This system uses a camera sensor (11 in FIG. 1), which captures the video frames (12 in FIG. 1), stores the most recent frame in a memory (13 in FIG. 1) and then processes it with a car-plate detection device (14 in FIG. 1), comprising a storage section (15 in FIG. 1) and a processing section (16 in FIG. 1), in order to extract car plates.

The Car Plate Detection Device through which the system detects car-plates and extracts the information of the coordinates is shown in FIG. 2.

This exemplary system functions as follows. First, two consecutive frames I_(i) and I_(i+1) (12 in FIG. 1) are input from the Storage Memory (13 in FIG. 1) into the Image Data Input Unit (221 in FIG. 2) and are temporarily stored in the Input Image Data memory (21 in FIG. 2). The data are then fed into the Moving Object Detection unit (222 in FIG. 2), which detects the parts of the video frames that correspond to moving objects at any time and stores these parts in the Moving Object Image Data memory (22 in FIG. 2). Data from the Moving Object Image Data memory are then fed into the Automatic Threshold Adaptation unit (223 in FIG. 2), which calculates the optimal local binarization threshold parameter. This unit also takes input from the Digit Segmentation unit (229 in FIG. 2) and from the Number of Digits Input unit (233 in FIG. 2). It then feeds the threshold parameter into the Image Binarization unit (224 in FIG. 2), which binarizes the moving object image data obtained from the Moving Object Image Data memory (22 in FIG. 2) and stores the result in the Binary Image Data memory (23 in FIG. 2). The Image Binarization unit (224 in FIG. 2) can optionally obtain the threshold from the user through the Threshold Input unit (231 in FIG. 2), or from the Automatic Threshold Calculation unit (232 in FIG. 2).

Data from the Binary Image Data memory are then fed to the Morphological Filtering unit (225 in FIG. 2), which filters out unwanted noise and stores the filtered image data in the Filtered Binary Image Data memory (24 in FIG. 2). Data from this memory are input to the Connected Component Analysis (CCA) unit (226 in FIG. 2), which analyzes the binary data to find blocks of pixels corresponding to regions (blobs) and stores the results in the Region Data memory (25 in FIG. 2).

The next step is the classification of the blobs in order to identify the car plates. This procedure takes place in the Region Classification unit (228 in FIG. 2), which analyzes the data previously stored in the Region Data memory (25 in FIG. 2) using classification criteria defined by the user through the Classification Criteria Trimming unit (227 in FIG. 2). The output of the Region Classification unit, namely the plate coordinates, is stored in the Detected Plate Coordinates memory (26 in FIG. 2).

The detected plate coordinates are then fed into the Plate Output unit (234 in FIG. 2), which outputs the plates to the system output when the Automatic Threshold Adaptation unit (223 in FIG. 2) indicates that the correct number of digits has been detected.

The final processing step is the segmentation of the digits contained in each detected plate. This procedure takes place in the Digit Segmentation unit (229 in FIG. 2).

The segmented digits are then fed into the Digit Output unit (230 in FIG. 2), which outputs them to the system output when the Automatic Threshold Adaptation unit (223 in FIG. 2) indicates that the correct number of digits has been detected.

In the following paragraphs, the units referred to above are explained in detail.

Moving Object Detection Unit (222 in FIG. 2)

This unit detects pixel motion from consecutive video frames. The target is to identify one or more moving cars against a steady background as viewed by the camera. The background corresponds to the view of the camera when no car is present and nothing else moves. However, such a complete absence of motion rarely occurs under real-world conditions, so the background is instead represented by a background model. The background model is an image obtained with a statistical methodology that absorbs minor differences caused by slight variations in lighting conditions, electronic noise from the camera sensor, or minor motions inherent in the video scene (e.g. tree leaves moving in the wind).

Given the background model, the moving pixels in a video frame can be identified by subtracting the background model from that frame. Referring to FIG. 3, the moving pixels (32 and 34 in FIG. 3) corresponding to a moving object within the video sequence are identified by subtracting the background model image (33 in FIG. 3) from the current frame (31 in FIG. 3).

The more intense the motion in the current frame, the more pixels differ from the background model.

The background model can be calculated using statistical techniques: each pixel in the background model is modeled from the corresponding pixels of a number of consecutive frames, as shown in FIG. 4. More specifically, each pixel PBM_(k) in the background model (43 in FIG. 4) is a statistical measure of the central tendency of the pixel population formed by the pixels P_(k1 . . . N) (41 in FIG. 4) that have the same coordinates as PBM_(k) in the consecutive video frames I₁ . . . I_(N) (42 in FIG. 4). Possible measures of central tendency include the mean, the median and the mode. However, to use such a measure, a number of consecutive video frames must be stored in a buffer memory, which is a significant problem when the system is to be implemented as an embedded system: memory in an embedded system is usually limited, so this type of implementation is not feasible. The mean value is an exception, since it can be approximated progressively by a running average that does not require past frames to be stored:

$\begin{matrix} {PBM_{k} \leftarrow 0.5\,PBM_{k} + 0.5\,P_{ki},\quad i = 1 \ldots N} & (1a) \end{matrix}$

In an exemplary embodiment, a weighted average described by the following relation is used:

$\begin{matrix} {PBM_{k} \leftarrow \alpha\,PBM_{k} + (1 - \alpha)\,P_{ki},\quad i = 1 \ldots N} & (2a) \end{matrix}$

Equation (2a) generalizes equation (1a) through the parameter α; the running average corresponds to α = 0.5. Values of α smaller than 0.5 make the system more robust to background changes: the background model changes faster or, equivalently, the system has a shorter memory and forgets its history more quickly. The smaller the parameter α, the faster the background model changes.
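The update rule of eq. (2a) can be sketched in a few lines. This is a minimal illustration, assuming each frame is a 2-D list of gray values; the function name `update_background` is hypothetical and not part of the described system.

```python
def update_background(bm, frame, alpha=0.5):
    """Apply eq. (2a) to every pixel: BM <- alpha * BM + (1 - alpha) * frame.

    bm and frame are same-sized 2-D lists of gray values; a new model is returned.
    alpha = 0.5 reproduces the running average of eq. (1a).
    """
    return [
        [alpha * b + (1.0 - alpha) * p for b, p in zip(bm_row, f_row)]
        for bm_row, f_row in zip(bm, frame)
    ]


# The model starts at zero for every pixel (step 52 in FIG. 5) and is
# refined frame by frame (step 53 in FIG. 5).
frames = [[[100, 100]], [[100, 100]], [[100, 200]]]
bm = [[0.0, 0.0]]
for f in frames:
    bm = update_background(bm, f, alpha=0.5)
```

With α = 0.5 the sketch reproduces eq. (1a); lowering α makes the model follow scene changes faster, as described above.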

More specifically, the procedure of detecting a car-plate is the following: As a first step, the background model BM is calculated. In the first iteration, the background model is initialized with a zero value for every pixel (52 in FIG. 5). Then, the background model is calculated (53 in FIG. 5) using eq. 2a.

The background model is then subtracted from the current frame (54 in FIG. 5). Finally, for every pixel, the absolute value of the difference, D_(k) = |P_(k) − PBM_(k)|, is compared with a threshold TH: the pixel is categorized as background if D_(k) < TH and as part of a moving object if D_(k) > TH (56 and 57 in FIG. 5). The parameter TH acts as the motion sensitivity: the larger TH, the less sensitive the system is to small motions. This is a very useful feature, since it controls the response of the system in noisy conditions where small motions are distributed across the entire frame area, such as rain or wind.
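A minimal sketch of the subtraction and thresholding steps (54 to 57 in FIG. 5), assuming the frame and the background model are same-sized 2-D lists; `motion_mask` is a hypothetical name. The text does not say how a pixel with D_(k) exactly equal to TH is classified; this sketch treats it as background.

```python
def motion_mask(frame, bm, th):
    """Classify each pixel: 1 = moving object, 0 = background.

    D_k = |P_k - PBM_k|; a pixel is marked as moving when D_k > th.
    Pixels with D_k == th are treated as background (an assumption).
    """
    return [
        [1 if abs(p - b) > th else 0 for p, b in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, bm)
    ]
```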

As a final step, the system outputs the coordinates of the moving object using the following procedure. First, the coordinates of all pixels characterized as “moving” are sorted (58 in FIG. 5), from which the minimum and maximum coordinates in the x-direction (x_min and x_max) and in the y-direction (y_min and y_max) are obtained. Then the rectangle Q₁Q₂Q₃Q₄ (62 in FIG. 6) is formed, representing the moving portion of the video frame (61 in FIG. 6), with corner points Q₁ = (x_min, y_min), Q₂ = (x_max, y_min), Q₃ = (x_min, y_max) and Q₄ = (x_max, y_max).
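The bounding-rectangle step can be sketched as follows, assuming the motion mask is a 2-D list with 1 for moving pixels (a hypothetical representation); `moving_bbox` is a hypothetical name. Instead of fully sorting the coordinates, the sketch takes their minimum and maximum directly, which yields the same x_min, x_max, y_min and y_max in a single linear pass.

```python
def moving_bbox(mask):
    """Return the corners (Q1, Q2, Q3, Q4) of the rectangle enclosing all
    moving pixels, or None when nothing moves. mask[y][x] is 1 for moving pixels."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    if not xs:
        return None
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Corner order follows the text: Q1, Q2, Q3, Q4
    return ((x_min, y_min), (x_max, y_min), (x_min, y_max), (x_max, y_max))
```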

Image Binarization Unit (224 in FIG. 2)

The Binarization unit (224 in FIG. 2) binarizes the input image. Binarization is the formation of a new image whose pixels take only two possible values. In the context of the current invention, these values are 0 (black) and 255 (white).

The binarization procedure employs the comparison of each pixel in the image with a threshold value TH_bin and then forms a new binary image having a one to one correspondence with the initial image described as follows: Pixels in the original image with a value greater than TH_bin correspond to pixels with value 255 in the binary image and pixels in the original image with a value lower than TH_bin correspond to pixels with value 0 in the binary image.
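A minimal sketch of this global binarization rule, assuming an 8-bit gray-scale image stored as a 2-D list; `binarize` is a hypothetical name. Pixels exactly equal to TH_bin are not covered by the description; the sketch maps them to 0.

```python
def binarize(image, th_bin):
    """Global binarization: 255 where the pixel value exceeds th_bin, else 0."""
    return [[255 if p > th_bin else 0 for p in row] for row in image]
```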

However, binarization with a global threshold is not optimal. A major problem with global thresholding is that changes in illumination across the scene may make some parts brighter (in the light) and some parts darker (in shadow) in ways that have nothing to do with the objects in the image.

Such uneven illumination can be handled by determining thresholds locally. That is, instead of having a single global threshold, we allow the threshold itself to smoothly vary across the image.

Local Thresholding

In the current invention, we use a local thresholding method that uses local edge properties within a window to compute the threshold.

Automatic Threshold Adaptation Unit (223 in FIG. 2)

The selection of the threshold in the Image Binarization unit is critical, since it influences the content of the binary image and ultimately the precision of the detection system. The appropriate value of this threshold usually changes with the image content and the lighting conditions. Therefore, a constant (global or local) threshold, although an option, is not optimal. For this reason, an automatic threshold adaptation unit is included in the system described in the current invention. The system is able to adapt a global or local threshold according to the results of the detection process.

In an exemplary embodiment, a local thresholding approach is used that adapts the threshold using feedback from the system output, more specifically from the Digit Segmentation unit (229 in FIG. 2).

More specifically the unit functions as follows: For every frame I_(K) (71 in FIG. 7) an edge-map is obtained (76 in FIG. 7).

An edge map is defined as an image containing image edges. An image edge is a point in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The points at which image brightness changes sharply are typically organized into a set of curved line segments termed edges.

Edge detection is the process of obtaining the edge map of an image. The detection process typically filters the image by convolving it with a small standard matrix known as an “operator”. This filtering results in an image with increased intensity for pixels belonging to an edge and decreased intensity for pixels not belonging to an edge. Usually, as a final step, the binary edge map is obtained by binarizing the edge-map image with a threshold. This results in an image with white pixels at the edges and black pixels everywhere else.

In an exemplary embodiment, the binary edge map (76 in FIG. 7) of frame I_(K) (71 in FIG. 7) is obtained by first applying an edge filtering using a Sobel operator (74 in FIG. 7)[1] and then binarization using thresholding (75 in FIG. 7). The threshold value for the binarization unit is obtained from the Threshold Trimming sub-system (751 in FIG. 7) described below.
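The edge-map step can be sketched as below. This is an illustration under stated assumptions, not the device's implementation: the image is a 2-D list of gray values, the gradient magnitude is approximated by |G_x| + |G_y| (a common cheap approximation of the Sobel magnitude), border pixels are set to 0, and `sobel_edge_map` is a hypothetical name.

```python
def sobel_edge_map(image, thres_1):
    """Binary edge map: 255 where the approximate Sobel gradient magnitude
    exceeds thres_1, else 0. Border pixels, where the 3x3 kernel does not
    fit, are set to 0 (an assumption)."""
    h, w = len(image), len(image[0])
    gx_k = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal gradient kernel
    gy_k = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical gradient kernel
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = image[y + j - 1][x + i - 1]
                    gx += gx_k[j][i] * p
                    gy += gy_k[j][i] * p
            # |gx| + |gy| approximates sqrt(gx**2 + gy**2)
            if abs(gx) + abs(gy) > thres_1:
                out[y][x] = 255
    return out
```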

The Threshold Trimming sub-system functions as follows. A predetermined initial threshold value THRES_1 = THRES_1_IN1 is set equal to the largest integer smaller than 2^(Nb)/2, where Nb is the number of bits used to represent a pixel value (e.g. for an 8-bit representation this number equals 127). The plate detection and digit segmentation process is then run. When it has finished, the number of detected digits is fed back from the Digit Segmentation unit (229 in FIG. 2), and the required number of digits is input from the Number of Digits Input unit (233 in FIG. 2). If the number of detected digits is smaller than the required number, the threshold value THRES_1 is decreased and the detection is re-initiated. If the number of detected digits is larger than the required number, THRES_1 is increased and the detection is re-initiated. This process is repeated until the number of detected digits equals the required number of digits.
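The feedback loop of the Threshold Trimming sub-system can be sketched as follows. `count_digits` stands for a hypothetical callback that reruns plate detection and digit segmentation with a given threshold and returns the number of digits found; `trim_threshold` is also a hypothetical name. The text only says the loop repeats until the counts match; the step size, the `max_iter` guard against oscillation and the range check are assumptions added so the sketch always terminates.

```python
def trim_threshold(count_digits, required_digits, n_bits=8, step=1, max_iter=256):
    """Adjust THRES_1 until count_digits(THRES_1) == required_digits.

    Returns the matching threshold, or None if none is found within max_iter
    iterations or the threshold leaves the valid pixel range (assumptions).
    """
    th = 2 ** (n_bits - 1) - 1            # largest integer below 2**Nb / 2, e.g. 127
    max_val = 2 ** n_bits - 1
    for _ in range(max_iter):
        found = count_digits(th)
        if found == required_digits:
            return th
        # Too few digits: lower the threshold so more edges survive.
        # Too many digits: raise it.
        th = th - step if found < required_digits else th + step
        if not 0 <= th <= max_val:
            return None
    return None
```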

Each threshold produced by the Threshold Trimming sub-system is fed to the Thresholding I sub-system (75 in FIG. 7), which binarizes the edge map to obtain the binary edge map E_(K) (76 in FIG. 7).

Next, the input frame I_(K) and the binary edge map E_(K) are each partitioned into N_(BX)×N_(BY) blocks of w×w pixels. Then, for frame I_(K), the following procedure is applied iteratively to every block I^(K)_(ij) (75 in FIG. 7):

The block I^(K)_(ij) is taken (75 in FIG. 7), together with the corresponding block E^(K)_(ij) of the binary edge map E_(K) (78 in FIG. 7). Each pair of blocks is then binarized as follows. First, the block I^(K)_(ij) is multiplied pixel by pixel (79 in FIG. 7) with the corresponding block E^(K)_(ij). The resulting block D^(K)_(ij) (791 in FIG. 7) is a semi-binary image: a pixel keeps the gray-scale value of the corresponding pixel in I^(K)_(ij) when the corresponding pixel in E^(K)_(ij) is non-zero (i.e. the pixel lies on an edge), and is zero everywhere else.

The next step is the binarization of this semi-binary block D^(K) _(ij) by applying a thresholding scheme (792 in FIG. 7), using a threshold calculated by the following formula:

$\begin{matrix} {THRES_{2} = \sum\limits_{x=1}^{w}\sum\limits_{y=1}^{w} D_{ij}^{xy}} & (1) \end{matrix}$

where D_(ij)^(xy) is the pixel in the x-th column and the y-th row of the block D^(K)_(ij), and w is the block size. The result is a binary version B^(K)_(ij) (793 in FIG. 7) of the block I^(K)_(ij) of the video frame I_(K).
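A sketch of the block-wise binarization of FIG. 7 under stated assumptions. Taken literally, eq. (1) sums the gray values of all edge pixels in a block, which exceeds the 8-bit range as soon as a block contains more than a few edge pixels; this sketch therefore assumes the sum is normalized by the number of edge pixels, so THRES_2 is the mean gray value along the edges. Following the statement that B^(K)_(ij) is a binary version of I^(K)_(ij), the threshold is applied to the gray-scale block. Blocks without edge pixels are set to 0, partial blocks at the image border are processed as they are, and `local_binarize` is a hypothetical name.

```python
def local_binarize(image, edges, w):
    """Edge-guided block binarization (a sketch of FIG. 7).

    image: gray-scale frame I_K; edges: binary edge map E_K (0 or 255), same size.
    In each w x w block, D keeps the gray values of I only on edge pixels.
    ASSUMPTION: THRES_2 = mean of those values (eq. (1) divided by their count).
    """
    h, wd = len(image), len(image[0])
    out = [[0] * wd for _ in range(h)]
    for by in range(0, h, w):
        for bx in range(0, wd, w):
            ys = range(by, min(by + w, h))
            xs = range(bx, min(bx + w, wd))
            # Non-zero entries of the semi-binary block D
            edge_vals = [image[y][x] for y in ys for x in xs if edges[y][x]]
            if not edge_vals:
                continue                  # no edges: block stays 0 (assumption)
            thres_2 = sum(edge_vals) / len(edge_vals)
            for y in ys:
                for x in xs:
                    out[y][x] = 255 if image[y][x] > thres_2 else 0
    return out
```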

Automatic Threshold Calculation Unit (232 in FIG. 2)

As an alternative to automatic threshold adaptation, an Automatic Threshold Calculation unit can be used. This unit employs a global threshold calculation algorithm, which can lead to acceptable performance.

Several automatic global threshold calculation approaches can be used in this system [2]:

Algorithm of Ridler and Calvard, an iterative method that repeatedly sets the threshold to the average of the mean gray levels of the two classes it separates, until the threshold no longer changes.

Algorithm of Otsu, a classical image binarization algorithm that separates foreground from background with a global threshold chosen to maximize the between-class variance of the gray-level histogram. This algorithm can also be applied iteratively to the gray-scale histogram of an image to generate threshold candidates.

Algorithm of Pun, which proposes an entropy-based criterion for image thresholding. Kapur et al. revised and improved Pun's algorithm by assuming two probability distributions, one for objects and one for background, and maximizing the entropy of the image to obtain the optimal threshold.

Algorithm of Kittler and Illingworth, a minimum error thresholding algorithm that minimizes a criterion related to the probability of classification error. It assumes that the image can be characterized by a mixture of two Gaussian distributions of object and background pixels.

Algorithm of Fan et al., a fast entropic technique that obtains a global threshold automatically with reduced computational complexity.

Algorithm of Portes de Albuquerque et al., an entropic thresholding algorithm based on the non-extensive Tsallis entropy.

Algorithm of Xiao et al., an entropic thresholding algorithm based on the gray-level spatial correlation (GLSC) histogram. It is a revision and extension of Kapur et al.'s algorithm.

In one exemplary embodiment, the algorithm of Kapur has been selected for implementation [3]. This algorithm assumes two probability distributions, one for objects p_(obj) (foreground) and one for background p_(bg), and maximizes the between-class entropy of the image to obtain the optimal threshold.

The between-class entropy of the thresholded image is defined as:

$\begin{matrix} {{f_{1}({TH})} = {{H\left( {0,{TH}} \right)} + {H\left( {{TH},L} \right)}}} & (2) \\ {where} & \; \\ {{H\left( {0,{TH}} \right)} = {- {\sum\limits_{i = 1}^{TH}\; {\frac{p_{i}}{p_{obj}}\ln \frac{p_{i}}{p_{obj}}}}}} & (3) \\ {{H\left( {{TH},L} \right)} = {- {\sum\limits_{i = {{TH} + 1}}^{L}\; {\frac{p_{i}}{p_{bg}}\ln \frac{p_{i}}{p_{bg}}}}}} & (4) \\ {and} & \; \\ {p_{obj} = {- {\sum\limits_{i = 0}^{TH}\; p_{i}}}} & (5) \\ {p_{bg} = {1 - p_{obj}}} & (6) \end{matrix}$

Here p_(i) is the probability that gray value i appears in the current image, defined as the number of pixels with value i divided by the total number of pixels, and L is the maximum gray level (e.g. 255 for 8-bit images).

For bi-level thresholding, the optimal threshold is:

$\begin{matrix} {TH_{optimal} = \arg\max\limits_{TH} f_{1}(TH)} & (7) \end{matrix}$

In other words the optimal threshold value is the value of TH for which the quantity f₁ is maximized for each frame.
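Eqs. (2) to (7) can be sketched directly from the image histogram. This is a minimal illustration assuming integer gray values in [0, levels − 1]; `kapur_threshold` is a hypothetical name, terms with p_(i) = 0 are skipped (0·ln 0 is taken as 0), and thresholds that leave one class empty are ignored. For clarity it recomputes the class sums for every TH, which costs O(L²); running sums would reduce this to linear time.

```python
import math


def kapur_threshold(pixels, levels=256):
    """Return the TH that maximizes f1(TH) = H(0,TH) + H(TH,L), eqs. (2)-(7).

    pixels: iterable of integer gray values in [0, levels - 1].
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    n = sum(hist)
    p = [c / n for c in hist]            # p_i: relative frequency of gray value i

    best_th, best_f = None, -math.inf
    for th in range(levels - 1):
        p_obj = sum(p[: th + 1])          # eq. (5)
        p_bg = sum(p[th + 1:])            # eq. (6), summed directly to avoid round-off
        if p_obj <= 0 or p_bg <= 0:
            continue                      # one class would be empty
        # eqs. (3) and (4); zero-probability terms contribute nothing
        h_obj = -sum(q / p_obj * math.log(q / p_obj) for q in p[: th + 1] if q > 0)
        h_bg = -sum(q / p_bg * math.log(q / p_bg) for q in p[th + 1:] if q > 0)
        if h_obj + h_bg > best_f:         # eq. (7): keep the arg max
            best_th, best_f = th, h_obj + h_bg
    return best_th
```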

Threshold Input Unit (231 in FIG. 2)

This unit is an input unit, which can be used optionally to input a threshold value manually.

Morphological Filtering Unit (225 in FIG. 2)

In the presence of electronic noise or physical obstacles (e.g. dust), the binarization process may produce binary noise, which appears as white spots. These spots can significantly increase the processing time, because the Connected Component Analysis unit (226 in FIG. 2) separately analyzes each non-black pixel to determine whether it is connected to any other pixel.

To overcome this problem, the Morphological Filtering unit removes isolated pixels to produce a cleaner binary image.

The unit implements the following morphological operation. In each video frame (80 in FIG. 8), a 3×3 window is formed (81 in FIG. 8) and slid across the binary image, starting from position (0,0) and moving towards higher x and y coordinates.

For each window position, the number of black pixels N_(b) and the number of white pixels N_(w) are counted. If N_(b)>N_(w), the central pixel of the 3×3 window is set to black (82 in FIG. 8); otherwise it is set to white (83 in FIG. 8).
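The 3×3 majority filter can be sketched as below, assuming a 0/255 image stored as a 2-D list; `majority_filter` is a hypothetical name. The description does not say whether results are written back in place or to a new image, or how border pixels are handled; this sketch writes to a copy and leaves the one-pixel border unchanged. Because the window holds nine pixels, N_(b) and N_(w) can never be equal.

```python
def majority_filter(binary):
    """3x3 majority filter on a 0/255 image, returning a new image.

    Writing to a copy and leaving the 1-pixel border unchanged are assumptions.
    """
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Count black pixels (value 0) in the 3x3 window centered at (x, y)
            n_b = sum(1 for j in (-1, 0, 1) for i in (-1, 0, 1)
                      if binary[y + j][x + i] == 0)
            n_w = 9 - n_b
            out[y][x] = 0 if n_b > n_w else 255
    return out
```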

Connected Component Analysis Unit (226 in FIG. 2)

This unit labels the binary image regions using a connected components algorithm. The target is to label each object within the binary image, which involves assigning a label to each pixel. Pixels that are connected are given the same label, so that at the end of this procedure each object is identified by the label shared by its constituent pixels.

In an exemplary embodiment, a run-length based connected component algorithm is used [4]. It is similar to the two-pass connected component algorithm [5], but operates on runs rather than on pixels, which makes it more efficient in terms of memory and processing power.

The stages involved in this implementation are as follows:

1. Encoding pixels to runs (using run-length encoding);

2. Initial labeling and propagation of labels;

3. Resolving of conflicts; and

4. Translating run labels to connected components.

Encoding Pixels to Runs (Using Run-Length Encoding)

In accordance with an exemplary embodiment of the current invention, a run-length encoded representation is used for labeling. The run-length encoded format is much more compact than a binary image (each run carries a single label), so the sequential label propagation stage that follows is much faster than in the conventional pixel-based algorithm.

Run-length encoding works as follows. Consider the binary image frame (91 in FIG. 9). The target is to encode the contiguous foreground (black) pixels, which, when the image is scanned row by row, form horizontal black lines. For each line, the x-coordinate s of its starting pixel, the x-coordinate e of its end pixel and its row r are recorded. For example, line L₁ in FIG. 9 (92 in FIG. 9) starts at the first pixel of its row, so s=0, ends at the 5th pixel of that row, so e=4, and lies on the second row, so r=1. This line is therefore encoded as (0,4,1), and this code is called a run. The same procedure is followed for every line in the image. A run is complete when the end of a row or a background pixel is reached. Since consecutive runs on a row must be separated by at least one background pixel, an image of size M×N (M rows of N pixels) contains at most M·⌈N/2⌉ runs, i.e. about MN/2. The flow of the related algorithm is shown in FIG. 10.
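A minimal run-length encoder following this description, assuming a 2-D list image with black (0) foreground as in FIG. 9; `encode_runs` is a hypothetical name. Each run is returned as (s, e, r) with an inclusive end coordinate, so line L₁ of FIG. 9 would be encoded as (0, 4, 1).

```python
def encode_runs(binary, fg=0):
    """Run-length encode the foreground pixels of a binary image.

    Returns a list of runs (s, e, r): start column, end column (inclusive), row,
    in raster order. fg is the foreground value (black, 0, as in FIG. 9).
    """
    runs = []
    for r, row in enumerate(binary):
        s = None
        for x, v in enumerate(row):
            if v == fg and s is None:
                s = x                           # a new run starts
            elif v != fg and s is not None:
                runs.append((s, x - 1, r))      # a background pixel closes the run
                s = None
        if s is not None:
            runs.append((s, len(row) - 1, r))   # the end of the row closes the run
    return runs
```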

Initial Labeling and Propagation of Labels

This stage involves the initial labeling and propagation of labels (FIG. 11). The IDs and equivalences (EQs) of all runs are initialized to zero. A raster scan of the runs then assigns provisional labels, which propagate to any adjacent runs on the row below. Any unassigned run (ID_(i)=0) is given a unique value for both its ID and its EQ.

Next, the 4-way or 8-way connectivity of each run is checked. In 4-way connectivity, the adjacent pixels in four directions (up, down, left, right) are checked; if they are foreground pixels, they are connected, otherwise they are unconnected. Consider, for example, pixels P₃ (98 in FIG. 9) and P₄ (97 in FIG. 9). These are 4-way connected, since pixel P₃ lies to the left of pixel P₄.

In 8-way connectivity, the diagonal directions are also checked. Consider for example pixels P₁ (95 in FIG. 9) and P₂ (96 in FIG. 9), which are not 4-way connected to each other. However P₂ is in the diagonal direction of P₁, so P₁ and P₂ are 8-way connected.

For each run R_(i) with identity ID_(i), excluding runs on the last row of the image, the runs R_(j) on the row below R_(i) are checked for a connection. In terms of run-length encoded lines, a 4-way connection between two runs R_(i), R_(j) means that the following conditions hold:

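$\begin{matrix} {s_{i} \leq e_{j}} & (8) \end{matrix}$

and

$\begin{matrix} {e_{i} \geq s_{j}} & (9) \end{matrix}$

An 8-way connection between two runs R_(i), R_(j) means that the following conditions hold:

$\begin{matrix} {s_{i} \leq e_{j} + 1} & (10) \end{matrix}$

and

$\begin{matrix} {e_{i} + 1 \geq s_{j}} & (11) \end{matrix}$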

A connected run R_(j) in the row below R_(i) is assigned the identity ID_(i) if and only if its own ID, ID_(j), is unassigned. If there is a conflict (e.g. if an overlapping run already has an assigned ID_(j)), the equivalence of run i, EQ_(i), is set to ID_(j).

Resolving of Conflicts

The EQ and ID values of each run should be equal. A difference between these two values for a run indicates a conflict, which occasionally occurs when objects of particular shapes are encountered, so a third stage must be included to resolve these conflicts. For example, this problem may occur when a ‘U’-shaped object is encountered. As shown in FIG. 12, applying the labeling algorithm to the ‘U’-shaped object (123 in FIG. 12) will generate four runs R₁, R₂, R₃, R₄, each with unassigned ID and

The solution is a conflict-resolving algorithm that scans all runs sequentially, as shown in FIG. 13.

Translating Run Labels to Connected Components

At the end of this procedure, each run has a label, so the final components are obtained simply by gathering the runs that share the same label.
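The resolution and gathering stages can be sketched together. This sketch does not reproduce the sequential scan of FIG. 13, which is not detailed in the text; it substitutes a standard union-find (disjoint-set) structure that merges every (ID, EQ) label pair and then groups runs by their root label. `resolve_and_group` is a hypothetical name.

```python
def resolve_and_group(runs, ids, eqs):
    """Merge labels linked by ID/EQ pairs and group runs into components.

    NOTE: union-find is used here as a stand-in for the sequential
    conflict-resolving scan of FIG. 13. Returns {final_label: [runs...]}.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]    # path halving
            a = parent[a]
        return a

    for i, e in zip(ids, eqs):
        ra, rb = find(i), find(e)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smaller label as root

    components = {}
    for run, i in zip(runs, ids):
        components.setdefault(find(i), []).append(run)
    return components
```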

Region Classification Unit (228 in FIG. 2)

The aim of this unit is to classify each region identified by the CCA unit (226 in FIG. 2) and stored in the Region Data memory (25 in FIG. 2) as a car plate or not. To this end, several characteristic features of each region are measured. These features then form a vector characterizing the region, which is then classified.

The region classification procedure includes two steps: The region feature extraction and the region classification.

Region Feature Extraction

Region feature extraction includes the measurement of several characteristic features of each region (142 in FIG. 14). The features that are measured are the following:

The width of the region: Width corresponds to the width of a rectangle surrounding the region under consideration (144 in FIG. 14). The width of the rectangle is computed as the difference of the maximum x coordinate minus the minimum x coordinate.

The area that the region occupies: This is the area, in square pixels, of a rectangle surrounding the region under consideration (144 in FIG. 14). The width of the rectangle is computed as the maximum x coordinate minus the minimum x coordinate, and its height as the maximum y coordinate minus the minimum y coordinate. The area equals the product of the width and the height of the rectangle.

The magnitude of the region: This is the number N_(NW) of non-white pixels in the connected region, measured in pixels.

The plenitude of a region: This measure indicates how full the region under consideration is. For example, a region containing gaps has lower plenitude than a region without gaps. The plenitude of a region is defined as the ratio of the magnitude to the area, as defined above.

The aspect ratio of a rectangle surrounding the region under consideration (143 in FIG. 14): The region under test is surrounded by a rectangle. The ratio of the rectangle's height to its width gives the aspect ratio of that region.

Number of scan-line intersection points: Several “virtual” lines of 1-pixel thickness are considered that intersect the region at different heights (144 in FIG. 14). The system records the number of region pixels that each scan line meets along its track through the region and produces a feature vector FV_(SL) containing N_(SL) pairs, where N_(SL) is the number of scan lines; each pair holds the ID of a scan line and the number of pixels that this line intersects. As an example, consider the scan lines indicated in FIG. 14 (144 in FIG. 14). Since the first line intersects 2 pixels, the second line 3 pixels and the third line 3 pixels, the feature vector is FV_(SL)={1,2,2,3,3,3}.
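The geometric features above can be computed directly from a region's runs, without decoding it to pixels. This sketch assumes horizontal scan lines at caller-chosen rows and uses hypothetical names (`region_features`, `scan_line_vector`). Two choices are flagged as assumptions: extents are measured inclusively (max − min + 1), which differs by one pixel from the max − min convention in the text but keeps a one-pixel-wide region from having zero width, and plenitude is computed as magnitude divided by area, the reading under which a region with gaps has lower plenitude.

```python
def region_features(runs):
    """Geometric features of one region given its runs (s, e, r).

    Extents are inclusive (max - min + 1), an assumption; see the lead-in.
    """
    x_min = min(s for s, _, _ in runs)
    x_max = max(e for _, e, _ in runs)
    y_min = min(r for _, _, r in runs)
    y_max = max(r for _, _, r in runs)
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    area = width * height                           # bounding-rectangle area
    magnitude = sum(e - s + 1 for s, e, _ in runs)  # N_NW: pixels in the region
    return {
        "width": width,
        "area": area,
        "magnitude": magnitude,
        "plenitude": magnitude / area,              # 1.0 for a gap-free rectangle
        "aspect_ratio": height / width,
    }


def scan_line_vector(runs, scan_rows):
    """FV_SL as in the text: [id1, count1, id2, count2, ...], ids starting at 1.

    Each count is the number of region pixels a horizontal scan line crosses.
    """
    fv = []
    for k, y in enumerate(scan_rows, start=1):
        fv += [k, sum(e - s + 1 for s, e, r in runs if r == y)]
    return fv
```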

Statistical normalized central moments (Hu moments): Statistical manipulation of the pixels and their coordinates within a region results in a set of region-specific features called statistical moments [6]. Central moments are given by the following expression:

$\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} \qquad (12)$

In Eq. (12), x and y are the coordinates of each non-white pixel in the region, and $\bar{x}$ and $\bar{y}$ are the mean values of the x and y coordinates, respectively, over all non-white pixels of the region. The integers p and q determine the order of a statistical moment. Combinations of low-order statistical moments describe physical properties of the region, such as the mean (mass center), the skewness and the angle with the x-axis. For example, the angle of a region with the horizontal x-axis is given by the following expression:

$\theta = \frac{1}{2} \arctan\left( \frac{2\mu_{11}}{\mu_{20} - \mu_{02}} \right) \qquad (13)$
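As an illustration of Eqs. (12) and (13), the sketch below evaluates the central moments directly on a list of pixel coordinates and derives the region angle. It uses the common principal-axis form θ = ½·atan2(2μ11, μ20 − μ02); `atan2` keeps the correct quadrant and avoids a division by zero when μ20 = μ02. Pixel coordinates follow the image convention (y grows downward), and the function names are illustrative only.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int]  # (x, y) coordinates of one non-white pixel


def central_moment(pixels: List[Pixel], p: int, q: int) -> float:
    """Eq. (12): sum over region pixels of (x - x_mean)^p * (y - y_mean)^q."""
    n = len(pixels)
    x_mean = sum(x for x, _ in pixels) / n
    y_mean = sum(y for _, y in pixels) / n
    return sum((x - x_mean) ** p * (y - y_mean) ** q for x, y in pixels)


def orientation_degrees(pixels: List[Pixel]) -> float:
    """Angle of the region's principal axis with the x-axis, in degrees."""
    mu11 = central_moment(pixels, 1, 1)
    mu20 = central_moment(pixels, 2, 0)
    mu02 = central_moment(pixels, 0, 2)
    # atan2 handles mu20 == mu02 without dividing by zero.
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))
```

A horizontal bar of pixels gives 0°, and the diagonal pixels (i, i) give 45°.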

In an exemplary embodiment, these statistical moments are calculated in the encoded space, directly on the run-length encoded runs. As described above, each run is described by three numbers, $s_i$, $e_i$ and $r_i$, which give the start column, the end column and the row of a horizontal run of non-white pixels within the region under consideration. With this description, Eq. (12) cannot be applied directly, since the coordinates of the individual pixels in the region are not available, so Eq. (12) must be modified accordingly. The modified central moments up to order 3 ($p + q \leq 3$), normalized by the pixel count $N_{NW}$, are given below.

$\begin{matrix} {\mspace{79mu} {\mu_{11} = {{\frac{1}{N_{NW}}{\sum\limits_{i}\; {{r_{i}\left( \frac{s_{i} + e_{i}}{2} \right)}\left( {e_{i} - s_{i} + 1} \right)}}} - \overset{\_}{y}}}} & (14) \\ {\mu_{20} = {{\frac{1}{N_{NW}}{\sum\limits_{i}\; {\left( \frac{e_{i} - s_{i} + 1}{6} \right)\left\lbrack {\left( {e_{i} + s_{i}} \right)^{2} + {e_{i}\left( {e_{i} + 1} \right)} + {s_{i}\left( {s_{i} - 1} \right)}} \right\rbrack}}} - {\overset{\_}{x}}^{2}}} & (15) \\ {\mspace{79mu} {\mu_{02} = {{\frac{1}{N_{NW}}{\sum\limits_{i}\; {r_{i}^{2}\left( {e_{i} - s_{i} + 1} \right)}}} - {\overset{\_}{y}}^{2}}}} & (16) \\ {\mspace{79mu} {\mu_{12} = {{\frac{1}{N_{NW}}{\sum\limits_{i}{{r_{i}^{2}\left( \frac{s_{i} + e_{i}}{2} \right)}\left( {e_{i} - s_{i} + 1} \right)}}} - {2\overset{\_}{y}\; \mu_{11}} - {\overset{\_}{x}\; \mu_{02}} + {\overset{\_}{x\;}{\overset{\_}{y}}^{2}}}}} & (17) \\ {\mu_{21} = {{\frac{1}{N_{NW}}{\sum\limits_{i}{{r_{i}\left( \frac{e_{i} - s_{i} + 1}{6} \right)}\left\lbrack {\left( {e_{i} + s_{i}} \right)^{2} + {e_{i}\left( {e_{i} + 1} \right)} + {s_{i}\left( {s_{i} - 1} \right)}} \right\rbrack}}} - {\overset{\_}{y}\; \mu_{20}} - {2\overset{\_}{x}\; \mu_{11}} - {{\overset{\_}{x\;}}^{2}\overset{\_}{y}}}} & (18) \\ {\mspace{79mu} {\mu_{03} = {{\frac{1}{N_{NW}}{\sum\limits_{i}\; {r_{i}^{3}\left( {e_{i} - s_{i} + 1} \right)}}} - {3\overset{\_}{y\;}\mu_{02}} - {\overset{\_}{y}}^{3}}}} & (19) \\ {{\mu_{30} = {{\frac{1}{N_{NW}}{\sum\limits_{i}{\left( \frac{e_{i} - s_{i} + 1}{4} \right)\left\lbrack {e_{i}^{3} - {2s_{i}^{3}} + {s_{i}^{2}\left( {e_{i} - 1} \right)} + {e_{i}\left( {s_{i} + 1} \right)}} \right\rbrack}}} - {3\overset{\_}{x}\; \mu_{20}} - {\overset{\_}{x}}^{3}}},} & (20) \\ {\mspace{79mu} {where}} & \; \\ {\mspace{79mu} {\overset{\_}{x} = {\frac{1}{N_{NW}}{\sum\limits_{i}{{r_{i}\left( \frac{s_{i} + e_{i}}{2} \right)}\left( {e_{i} - s_{i} + 1} \right)}}}}} & (21) \\ {\mspace{85mu} {\overset{\_}{y} = 
{\frac{1}{N_{NW}}{\sum\limits_{i}{r_{i}\left( {e_{i} - s_{i} + 1} \right)}}}}} & (22) \end{matrix}$

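A useful variant is obtained by normalizing the central moments with the following relation, in which $\mu_{pq}$ is the unnormalized central moment of Eq. (12) and, for a binary region, $\mu_{00} = N_{NW}$ (the moments of Eqs. (14) to (20) are already divided by $N_{NW}$, so they must be multiplied by $N_{NW}$ before this relation is applied):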

$n_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \text{where } \gamma = \frac{p + q}{2} + 1 \qquad (23)$

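By using these normalized central moments, a new set of statistical moments can be formed, known as the Hu moments $I_i$, given by the following relations:

$I_1 = n_{20} + n_{02} \qquad (24)$

$I_2 = (n_{20} - n_{02})^2 + 4n_{11}^2 \qquad (25)$

$I_3 = (n_{30} - 3n_{12})^2 + (3n_{21} - n_{03})^2 \qquad (26)$

$I_4 = (n_{30} + n_{12})^2 + (n_{21} + n_{03})^2 \qquad (27)$

$I_5 = (n_{30} - 3n_{12})(n_{30} + n_{12})\left[(n_{30} + n_{12})^2 - 3(n_{21} + n_{03})^2\right] + (3n_{21} - n_{03})(n_{21} + n_{03})\left[3(n_{30} + n_{12})^2 - (n_{21} + n_{03})^2\right] \qquad (28)$

$I_6 = (n_{20} - n_{02})\left[(n_{30} + n_{12})^2 - (n_{21} + n_{03})^2\right] + 4n_{11}(n_{30} + n_{12})(n_{21} + n_{03}) \qquad (29)$

$I_7 = (3n_{21} - n_{03})(n_{30} + n_{12})\left[(n_{30} + n_{12})^2 - 3(n_{21} + n_{03})^2\right] - (n_{30} - 3n_{12})(n_{21} + n_{03})\left[3(n_{30} + n_{12})^2 - (n_{21} + n_{03})^2\right] \qquad (30)$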

In a different implementation, the run-length encoded region under consideration is first decoded to obtain the binary image corresponding to this region, and Eq. (12) is then applied directly. The decoding procedure is described below, in the description of the Digit Segmentation Unit.

The feature vector FV_(HM)={I₁, I₂, I₃, I₄, I₅, I₆, I₇} resulting from this set of features contains up to 7 numbers corresponding to the 7 Hu moments I₁ to I₇ as described in Eqs. 24-30.
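The sketch below computes FV_HM on decoded pixel coordinates: it normalizes the central moments of Eq. (12) with Eq. (23), taking μ00 as the pixel count of the binary region, and then evaluates Eqs. (24) to (30). The shorthand variables a, b, c and d are only there to keep the longer invariants readable; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[float, float]  # (x, y) of one non-white pixel


def normalized_central_moments(pixels: List[Pixel]) -> Dict[Tuple[int, int], float]:
    """n_pq = mu_pq / mu00**gamma with gamma = (p + q) / 2 + 1 (Eq. 23)."""
    n = len(pixels)  # mu00 of a binary region equals its pixel count
    xb = sum(x for x, _ in pixels) / n
    yb = sum(y for _, y in pixels) / n
    result = {}
    for p, q in [(2, 0), (0, 2), (1, 1), (3, 0), (0, 3), (2, 1), (1, 2)]:
        mu = sum((x - xb) ** p * (y - yb) ** q for x, y in pixels)  # Eq. 12
        result[(p, q)] = mu / n ** ((p + q) / 2 + 1)
    return result


def hu_moments(pixels: List[Pixel]) -> List[float]:
    """The seven Hu invariants I1..I7 of Eqs. (24) to (30)."""
    m = normalized_central_moments(pixels)
    n20, n02, n11 = m[(2, 0)], m[(0, 2)], m[(1, 1)]
    n30, n03, n21, n12 = m[(3, 0)], m[(0, 3)], m[(2, 1)], m[(1, 2)]
    a, b = n30 + n12, n21 + n03          # recurring sums
    c, d = n30 - 3 * n12, 3 * n21 - n03  # recurring differences
    return [
        n20 + n02,                                                       # I1
        (n20 - n02) ** 2 + 4 * n11 ** 2,                                 # I2
        c ** 2 + d ** 2,                                                 # I3
        a ** 2 + b ** 2,                                                 # I4
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),   # I5
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,               # I6
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),   # I7
    ]
```

Translating a region or rotating it by 90° leaves all seven values unchanged up to floating-point rounding, which is what makes FV_HM useful for plates seen at slightly different positions and angles.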

Region Classification

The region classification unit classifies each region under consideration as a car plate or not, also using input from the Classification Criteria Trimming unit (227 in FIG. 2).

In implementing an exemplary embodiment, a pattern classification scheme is used for region classification. To this end, the system is first trained offline, using a database of regions corresponding to plates and regions corresponding to non-plates. For each region, the features described in the previous section are evaluated and combined into a total feature vector. The feature vector is then projected into the feature space, defined as a multi-dimensional space with as many dimensions as the feature vector has elements. In such a projection, the feature vectors corresponding to plate and non-plate regions are concentrated (clustered) in separate areas of the feature space. Consider the example shown in FIG. 15, with a 3-dimensional feature vector FV={FV₁,FV₂,FV₃} that defines a 3-dimensional feature space (151 in FIG. 15). Each point in this space is defined by the three coordinates FV₁,FV₂,FV₃. Projecting the regions into this space creates two clusters: one for the regions corresponding to plates (153 in FIG. 15) and one for the regions not corresponding to plates (152 in FIG. 15).

The next step is to define the centers of the individual clusters. In accordance with one exemplary embodiment, this is achieved by calculating the center of mass of each cluster. The center of mass has coordinates $FV_C = \{\overline{FV}_1, \overline{FV}_2, \ldots, \overline{FV}_D\}$, where D is the dimensionality of the feature space, and each coordinate $\overline{FV}_k$ is defined as:

$\overline{FV}_k = \frac{1}{N_S} \sum_{i=1}^{N_S} FV_{ki} \qquad (31)$

where $N_S$ is the number of samples (regions) in the cluster and $FV_{ki}$ is the k-th feature of the i-th sample. In the 3-dimensional example above, the centers of the clusters are indicated as C1 (156 in FIG. 15) and C2 (157 in FIG. 15).
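Eq. (31) is a coordinate-wise mean over the training vectors of each cluster. A minimal sketch of the offline training step, assuming the labelled training feature vectors are already available as lists of numbers (the function names and labels are hypothetical):

```python
from typing import Dict, List, Sequence

FeatureVector = Sequence[float]


def cluster_center(samples: List[FeatureVector]) -> List[float]:
    """Eq. (31): coordinate-wise mean of the N_S training vectors of a cluster."""
    n_s = len(samples)
    dim = len(samples[0])
    return [sum(fv[k] for fv in samples) / n_s for k in range(dim)]


def train_centers(training: Dict[str, List[FeatureVector]]) -> Dict[str, List[float]]:
    """One center per class label, for example 'plate' and 'non-plate'."""
    return {label: cluster_center(vectors) for label, vectors in training.items()}
```

Only the centers need to be stored on the embedded device; the training database itself stays offline.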

When a new region T is tested, its feature vector FV_T is obtained; this corresponds to a point in the feature space. To determine which cluster this test point belongs to, its distance from the center of each cluster is computed using a distance measure such as the L1 distance, the L2 distance or the Mahalanobis distance.

In one exemplary embodiment, the L2 distance is used which is defined as follows: in Cartesian coordinates, if p=(p₁, p₂, . . . , p_(n)) and q=(q₁, q₂, . . . , q_(n)) are two points in Euclidean n-space, then the L2 or Euclidean distance from p to q, or from q to p is given by the following expression:

$d(p, q) = d(q, p) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2} \qquad (32)$

In the 3-dimensional example of FIG. 15, the distance of the test point T (155 in FIG. 15) from the cluster center C1 (156 in FIG. 15) is d1 (158 in FIG. 15), and its distance from the cluster center C2 (157 in FIG. 15) is d2 (154 in FIG. 15).

Once the distances of the test point from the cluster centers have been computed, the point is assigned to a cluster according to a proximity criterion: it belongs to the nearest cluster under the distance measure used. This decision classifies the region under test as plate or non-plate.
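The proximity criterion amounts to a nearest-centroid classifier. The sketch below implements Eq. (32) and picks the label of the closest center; the center values in the usage line are made-up numbers for illustration, not trained data, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Eq. (32): L2 distance between two points of the feature space."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def classify_region(fv_t: Sequence[float], centers: Dict[str, Sequence[float]]) -> str:
    """Assign the test vector FV_T to the label of the nearest cluster center."""
    return min(centers, key=lambda label: euclidean_distance(fv_t, centers[label]))
```

With hypothetical centers `{"plate": [4.0, 0.7, 8.0], "non-plate": [1.2, 0.3, 2.0]}`, the test vector `[3.8, 0.6, 7.0]` is classified as `"plate"`. Swapping `euclidean_distance` for an L1 or Mahalanobis measure changes only the key function.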

While the above description utilizes a specific classifier, it is understood that an Artificial Neural Network classifier or any other type of classifier can be used.

An alternative to pattern classification is feature filtering. In this scheme, a region is classified as plate or non-plate according to empirical measures corresponding to physical properties of the region, or according to empirical observations.

To this end, the features of width, magnitude, aspect ratio, plenitude, scan-lines and angle with the x-axis (Eq. 13) are used to form a decision vector, as follows.

Each of these features is checked against a target-value rule, i.e., a target value or a range of target values (TABLE 1), obtained from empirical observations or from governmental standards. These rules are input from the Classification Criteria Trimming unit (227 in FIG. 2).

Conformance to the target value corresponds to a true indication and a non-conformance to the target value corresponds to a false indication. To this end a binary decision vector DV is obtained as follows:

DV = {D_width_rule, D_magnitude_rule, D_aspect_ratio_rule, D_plenitude_rule, D_scan_lines_rule, D_angle_rule}

TABLE 1

| Feature | Target value rule (example) |
|---|---|
| Width (pixels) | > 100 AND < 300 |
| Magnitude (pixels) | > 1000 AND < 5000 |
| Aspect ratio | > 3 AND < 5 |
| Plenitude | > 0.5 AND < 0.9 |
| Scan-lines | > 5 AND < 12 |
| Angle | < 5 |

A simple approach is to classify the region as a plate if and only if the decision vector contains only logic ones, meaning that all the feature values conform to their target values.

However, the current implementation uses a decision fusion rule instead, which was found to give optimal results. This fusion rule is the following:

FR = [D_width_rule AND D_aspect_ratio_rule AND D_angle_rule] OR [D_plenitude_rule AND D_scan_lines_rule]

If FR is TRUE then the region is classified as a plate, while if FR is FALSE the region is classified as a non-plate.

The target-value rules can be changed when needed (e.g., when the system must be trimmed for a different country) through the Classification Criteria Trimming unit (227 in FIG. 2).
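The feature-filtering path can be sketched as a table of open intervals plus two Boolean combinations: the simple all-rules test and the fusion rule FR. Two points are assumptions not settled by the text: the angle rule is applied to the absolute value of the angle, and the scan-lines feature is passed in as a single number (how the scan-line vector is reduced to one value is not specified). Keeping the rules in a dictionary mirrors the idea that the trimming unit can replace them, for example per country; all names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Example target-value rules of TABLE 1 as open intervals (low, high);
# None means that side is unbounded.
Rule = Tuple[Optional[float], Optional[float]]
TABLE_1_RULES: Dict[str, Rule] = {
    "width": (100, 300),
    "magnitude": (1000, 5000),
    "aspect_ratio": (3, 5),
    "plenitude": (0.5, 0.9),
    "scan_lines": (5, 12),
    "angle": (None, 5),
}


def conforms(value: float, rule: Rule) -> bool:
    """True when value lies strictly inside the (low, high) interval."""
    low, high = rule
    return (low is None or value > low) and (high is None or value < high)


def decision_vector(features: Dict[str, float],
                    rules: Dict[str, Rule] = TABLE_1_RULES) -> Dict[str, bool]:
    """Binary decision vector DV: one True/False entry per rule."""
    values = dict(features)
    # Assumption: the angle rule is checked on the magnitude of the angle.
    values["angle"] = abs(values["angle"])
    return {name: conforms(values[name], rule) for name, rule in rules.items()}


def all_rules_pass(dv: Dict[str, bool]) -> bool:
    """The simple approach: every entry of DV must be true."""
    return all(dv.values())


def fusion_rule(dv: Dict[str, bool]) -> bool:
    """FR = [width AND aspect ratio AND angle] OR [plenitude AND scan-lines]."""
    return ((dv["width"] and dv["aspect_ratio"] and dv["angle"])
            or (dv["plenitude"] and dv["scan_lines"]))
```

Note that the magnitude rule contributes to the simple all-rules test but not to FR, so a region failing only on width and magnitude can still be accepted through the plenitude and scan-lines clause.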

Classification Criteria Trimming Unit (227 in FIG. 2)

This unit supplies the target-value rules to the region classification unit (228 in FIG. 2).

Plate Output Unit (234 in FIG. 2)

The aim of this unit is to output the coordinates of each region classified as a car plate. The unit outputs the plate if and only if the Automatic Threshold Adaptation unit (223 in FIG. 2) indicates that the correct number of digits has been detected.

Digit Segmentation Unit (229 in FIG. 2)

The aim of this unit is to segment the individual digits of a car plate so that they can be output from the system in binary form to an Optical Character Recognition (OCR) system.

The digits in a binary plate image appear as coherent regions (161 in FIG. 16). Therefore, the unit performs a CCA analysis similar to the analysis performed in the CCA unit (226 in FIG. 2). However, in addition to the plate digits, the plate image usually contains other regions, corresponding for example to the plate border line (163 in FIG. 16), separation and state signs (166 in FIG. 16) and noise (162 in FIG. 16). An additional filtering scheme is therefore applied to filter out any regions not corresponding to digits. This scheme computes a simple feature and checks it against a target-value rule.

The CCA analysis performed in this unit follows steps 2 and 3 of the CCA analysis performed in the CCA unit, preceded by an extra step: background-foreground inversion. In the first CCA analysis, the plate digits appear as white holes (background), since the digits are usually black. Consequently, they are not run-length encoded, and no information about them can be extracted. Therefore, a background-foreground inversion must be carried out for the regions detected as plates, using the procedure shown in FIG. 17 for a region containing N runs.
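Since FIG. 17 is not reproduced here, the following is only a sketch of one plausible reading of the inversion step: within each row, the background gaps enclosed between two consecutive runs of the plate region become the new foreground runs. Pixels to the left of the first run or to the right of the last run of a row are assumed to lie outside the plate and stay background. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Run = Tuple[int, int, int]  # (s, e, r): first column, last column (inclusive), row


def invert_runs(runs: List[Run]) -> List[Run]:
    """Turn the background gaps between the runs of each row into new runs.

    Assumption: only gaps enclosed between two runs of the same row become
    foreground; pixels outside the first and last run of a row stay background.
    """
    segments_by_row: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for s, e, r in runs:
        segments_by_row[r].append((s, e))
    inverted: List[Run] = []
    for r in sorted(segments_by_row):
        segments = sorted(segments_by_row[r])
        for (_, e_left), (s_right, _) in zip(segments, segments[1:]):
            if s_right > e_left + 1:              # at least one gap pixel
                inverted.append((e_left + 1, s_right - 1, r))
    return inverted
```

For the row runs (0, 2), (5, 7) and (10, 12), the inverted runs are (3, 4) and (8, 9); a row covered by a single run yields no inverted run.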

Once the background-foreground inversion has been carried out, the new inverted runs must be decoded into binary image format (pixel coordinates and values). This process is straightforward: a structured image memory is loaded with pixel values at the coordinates indicated by the run-length code. The process followed in the current implementation for a region containing N runs is shown in detail in FIG. 18.
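A minimal sketch of the decoding step, assuming the structured image memory is simply a list of rows sized to the bounding box of the runs (FIG. 18 itself is not reproduced, and the function name is hypothetical). The returned origin lets each pixel be mapped back to plate coordinates.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (s, e, r): first column, last column (inclusive), row


def decode_runs(runs: List[Run]) -> Tuple[List[List[int]], Tuple[int, int]]:
    """Rebuild a binary image (1 = foreground) covering the runs' bounding box.

    Returns the image as a list of rows and the (x, y) coordinates of its
    top-left pixel, so pixel (x, y) is stored at image[y - y0][x - x0].
    """
    x0 = min(s for s, _, _ in runs)
    x1 = max(e for _, e, _ in runs)
    y0 = min(r for _, _, r in runs)
    y1 = max(r for _, _, r in runs)
    image = [[0] * (x1 - x0 + 1) for _ in range(y1 - y0 + 1)]
    for s, e, r in runs:
        row = image[r - y0]
        for x in range(s, e + 1):      # write every pixel of the run
            row[x - x0] = 1
    return image, (x0, y0)
```

Each pixel is written exactly once, so the cost is proportional to the bounding-box size plus the number of foreground pixels, which suits the limited memory of an embedded target.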

Digit Output Unit (230 in FIG. 2)

The aim of this unit is to output the digits to the system output when the Automatic Threshold Adaptation unit (223 in FIG. 2) indicates that the right number of digits has been detected.

The systems, methods and techniques described herein can be performed or implemented on any device that comprises at least one camera, including but not limited to standalone cameras, security cameras, smart cameras, industrial cameras, mobile phones, tablet computers, laptop computers, smart TV sets and car boxes, i.e., devices embedded or installed in an automobile that collect video and images. It will be understood and appreciated by persons skilled in the art that one or more processes, sub-processes or process steps described in embodiments of the present invention can be implemented in hardware and/or software.

While the above-described flowcharts and methods have been discussed in relation to a particular sequence of events, it should be appreciated that changes to this sequence can occur without materially affecting the operation of the invention. Additionally, the exemplary techniques illustrated herein are not limited to the specifically illustrated embodiments but can also be utilized and combined with the other exemplary embodiments, and each described feature is individually and separately claimable.

Additionally, the systems, methods and protocols of this invention can be implemented on a special purpose computer, a programmed microprocessor or microcontroller and peripheral integrated circuit element(s), an ASIC or other integrated circuit, a digital signal processor, a hard-wired electronic or logic circuit such as discrete element circuit, a programmable logic device such as PLD, PLA, FPGA, PAL, any comparable means, or the like. In general, any device capable of implementing (or configurable to implement) a state machine that is in turn capable of implementing (or configurable to implement) the methodology illustrated herein can be used to implement the various methods, protocols and techniques according to this invention.

Furthermore, the disclosed methods may be readily implemented in software using object or object-oriented software development environments that provide portable source code that can be used on a variety of computer or workstation platforms. Alternatively, the disclosed system may be implemented partially or fully in hardware using standard logic circuits or VLSI design. Whether software or hardware is used to implement the systems in accordance with this invention is dependent on the speed and/or efficiency requirements of the system, the particular function, and the particular software or hardware systems or microprocessor or microcomputer systems being utilized. The systems and methods illustrated herein can be readily implemented in hardware and/or software using any known or later developed systems or structures, devices and/or software by those of ordinary skill in the applicable art from the functional description provided herein and with a general basic knowledge of the video processing arts.

Moreover, the disclosed methods may be readily implemented in software that can be stored on a storage medium and executed on a programmed general-purpose computer with the cooperation of a controller and memory, a special purpose computer, a microprocessor, or the like. In these instances, the systems and methods of this invention can be implemented as a program embedded on a personal computer, such as an applet, JAVA™ or CGI script, as a resource residing on a server or computer workstation, as a routine embedded in a dedicated system or system component, or the like. The system can also be implemented by physically incorporating the system and/or method into a software and/or hardware system, such as the hardware and software systems of an electronic device.

It is therefore apparent that there has been provided, in accordance with the present invention, systems and methods for the detection of multiple number-plates of moving vehicles. While this invention has been described in conjunction with a number of embodiments, it is evident that many alternatives, modifications and variations would be or are apparent to those of ordinary skill in the applicable arts. Accordingly, it is intended to embrace all such alternatives, modifications, equivalents and variations that are within the spirit and scope of this invention.

REFERENCES

(All of which are incorporated herein by reference in their entirety)

1. Sobel operator, Wikipedia, http://en.wikipedia.org/wiki/Sobel_operator
2. M. Athimethphat, “A Review on Global Binarization Algorithms for Degraded Document Images”, AU J. T. 14(3): 188-195, January 2011.
3. J. N. Kapur et al., “A new method for gray-level picture thresholding using the entropy of the histogram”, Computer Vision, Graphics and Image Processing, 29, 273-285, 1985.
4. K. Appiah, A. Hunter, H. Meng, P. Dickinson, “Accelerated hardware object extraction and labeling: from object segmentation to connected components labeling”, preprint submitted to Computer Vision and Image Understanding, Aug. 22, 2009.
5. N. Ma, D. G. Bailey, C. T. Johnston, “Optimized single pass connected components analysis”, IEEE International Conference on Field-Programmable Technology, 2008.
6. R. C. Gonzalez, R. E. Woods, “Digital Image Processing”, pages 514-516, Addison-Wesley, 1993.

What is claimed is:
 1. A system that detects car plates, including a camera, capable of detecting a moving car by analyzing video frames captured by said camera and identifying that portion of at least one video frame that corresponds to the moving car.
 2. The system of claim 1, wherein the analysis of the at least one video frame identifies a background model.
 3. The system of claim 2, wherein the background model is calculated using statistical techniques.
 4. A system to detect car plates, which utilizes video frames captured by a camera, wherein subsets of pixels within one of said video frames are set to one of two values based upon a threshold determined by an automatic binarization technique.
 5. The system of claim 4, wherein the automatic binarization technique utilizes a threshold that depends upon the content of at least one of said video frames.
 6. A system, which is capable of identifying car plates from one or more images captured by a camera, that utilizes region classification to determine whether a region within the one or more images is a car plate or is not a car plate; wherein said region classification is based upon pattern recognition with morphological features and wherein said morphological features are calculated in the run-length encoded domain.
 7. The system of claim 6, wherein the region classification is based on empirical target rules.
 8. The system of claim 6, wherein the region classification is based on empirical target rules and uses an optimal decision fusion rule. 